Database Design and Normal Forms 

Database Design 

■ coming up with a 'good' schema is very important 

How do we characterize the "goodness" of a schema ? 
If two or more alternative schemas are available 

how do we compare them ? 
What are the problems with "bad" schema designs ? 

Normal Forms: 

Each normal form specifies certain conditions 
If the conditions are satisfied by the schema 
certain kind of problems are avoided 

Details follow.... 
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An Example 

student relation with attributes: studName, rollNo, sex, studDept 
department relation with attributes: deptName, officePhone, hod 

Several students belong to a department. 

studDept gives the name of the student's department. 



Correct schema: 

Student 



Department 



studName 


rollNo 


sex 


studDept 



deptName 



officePhone 



HOD 



Incorrect schema: 

Student Dept 



studName 


rollNo 


sex 


deptName 


officePhone 


HOD 



What are the problems that arise ? 
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Problems with bad schema 

Redundant storage of data: 

Office Phone & HOD info - stored redundantly 

■ once with each student that belongs to the department 

■ wastage of disk space 

A program that updates Office Phone of a department 

■ must change it at several places 

• more running time 

• error - prone 

Transactions running on a database 

■ must take as short time as possible to increase transaction 

throughput 
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Update Anomalies 

Another kind of problem with bad schema 
Insertion anomaly: 
No way of inserting info about a new department unless 
we also enter details of a (dummy) student in the department 

Deletion anomaly: 
If all students of a certain department leave 
and we delete their tuples, 
information about the department itself is lost 

Update Anomaly: 
Updating officePhone of a department 

• value in several tuples needs to be changed 

• if a tuple is missed - inconsistency in data 
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Normal Forms 




First Normal Form (INF) - included in the definition of a relation 


Second Normal Form (2NF) 






denned m terms 01 


Third Normal Form (3NF) 


> functional dependencies 


Boyce-Codd Normal Form (BCNF) > 




Fourth Normal Form (4NF) - defined using multivalued 




dependencies 


Fifth Normal Form (5NF) or Project Join Normal Form (PJNF) 


defined using join dependencies 
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Functional Dependencies 

A functional dependency (FD) X —> Y 

(read as X determines Y) (X Q R, Y Q R) 

is said to hold on a schema R if 

in any instance r on R, 

if two tuples t 1? t 2 (tj =£ t 2 , tj G r, t 2 G r) 

agree on X i.e. X x [X] = t 2 [X] 
then they also agree on Y i.e. t x [Y] = t 2 [Y] 

Note: If K C R is a key for R then for any AGR, 

A 

holds because the above if then condition is 

vacuously true 
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Functional Dependencies - Examples 

Consider the schema: 

Student ( studName, rollNo , sex, dept, hostelName, roomNo) 

Since rollNo is a key, rollNo — > {studName, sex, dept, 

hostelName, roomNo} 

Suppose that each student is given a hostel room exclusively, then 

hostelName, roomNo — > rollNo 

Suppose boys and girls are accommodated in separate hostels, then 

hostelName — > sex 

FDs are additional constraints that can be specified by designers 
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Trivial / Non-Trivial FDs 

An FD X — ► Y where Y c X 

- called a trivial FD, it always holds good 

AnFDX^Y where Y £ X 

- non-trivial FD 

An FD X -> Y where XnY = f 

- completely non-trivial FD 
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Deriving new FDs 

Given that a set of FDs F holds on R 

we can infer that a certain new FD must also hold on R 

For instance, 

given that X — > Y, Y — > Z hold on R 
we can infer that X — > Z must also hold 

How to systematically obtain all such new FDs ? 

Unless all FDs are known, a relation schema is not fully specified 



Prof P Sreenivasa Kumar 
Department of CS&E, IITM 



9 



Entailment relation 



We say that a set of FDs F 1= { X Y} 

(read as F entails X — > Y or 

F logically implies X — > Y) 
if in every instance r of R on which FDs F hold, 

FDX^Y also holds. 



Armstrong came up with several inference rules 
for deriving new FDs from a given set of FDs 



We define F + = {X Y | F l=X -> Y} 



F : Closure of F 
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Armstrong's Inference Rules (1/2) 

1 . Reflexive rule 

F 1= {X -> Y | Y c X} for any X. Trivial FDs 

2. Augmentation rule 

{X -> Y} 1= {XZ -> YZ}, Z c R. Here XZ denotes X U Z 

3 . Transitive rule 

{X -> Y, Y -> Z} 1= {X -> Z} 

4. Decomposition or Projective rule 

{X -> YZ} 1= {X -> Y} 

5 . Union or Additive rule 

{X -> Y, X -> Z} 1= {X YZ} 

6. Pseudo transitive rule 

{X -> Y ? WY -» Z} 1= {WX Z} 
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Armstrong's Inference Rules (2/2) 

Rules 4, 5, 6 are not really necessary. 

For instance, Rule 5: {X Y, X -> Z} 1= {X YZ} can be 



3) X — > XY Augmentation rule on 1 

4) XY — > ZY Augmentation rule on 2 

5) X — > ZY Transitive rule on 3, 4. 

Similarly, 4, 6 can be shown to be unnecessary. 
But it is useful to have 4, 5, 6 as short-cut rules 



proved using 1, 2, 3 alone 



1 

2 




given 
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Sound and Complete Inference Rules 

Armstrong showed that 

Rules (1), (2) and (3) are sound and complete. 
These are called Armstrong's Axioms (AA) 

Soundness: 

Every new FD X — > Y derived from a given set of FDs F 
using Armstrong's Axioms is such that F 1= {X — > Y} 

Completeness: 

Any FDX^Y logically implied by F (i.e. F 1= {X Y}) 
can be derived from F using Armstrong's Axioms 
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Proving Soundness 

Suppose X — > Y is derived from F using AA in some n steps. 
If each step is correct then overall deduction would be correct. 
Single step: Apply Rule (1) or (2) or (3) 
Rule (1) - obviously results in correct FDs 

Rule (2) - {X -> Y} t= {XZ -> YZ}, Z c R 

Suppose t 1? t 2 E r agree on XZ 
=> t 1? t 2 agree on X 

=^> t 1? t 2 agree on Y (since X — > Y holds on r) 
=^> t 1? t 2 agree as YZ 
Hence Rule (2) gives rise to correct FDs 

Rule (3) - {X -> Y, Y Z} 1= X -> Z 

Suppose t l5 1 2 G r agree on X 
=^> t 1? t 2 agree on Y (since X — > Y holds) 
^> t 1? t 2 agree on Z (since Y — > Z holds) 
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Proving Completeness of Armstrong's Axioms (1/4) 

Define X + F (closure of X wrt F) 

= {A | X — > A can be derived from F using AA}, A E R 

Claim 1: 

X — ► Y can be derived from F using AA iff Y Q X + 
(If) Let Y = {A 1? A 2? . . ., A n } . Y c x + 

=^> X — ► A { can be derived from F using AA (1 < i < n) 

By union rule, it follows that X — > Y can be derived from F. 

(Only If) X — > Y can be derived from F using AA 
By projective rule X — > A { (1 < i < n) 
Thus by definition of X + , A { G X + 

=> Y c X + 
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Completeness of Armstrong's Axioms (2/4) 

Completeness: 

(F 1= {X -> Y}) => X -> Y follows from F using AA 

We will prove the contrapositive: 

X — >Y can't be derived from F using AA 

=> F f£ {X -> Y} 

=^> 3 a relation instance r on R st all the FDs of 
F hold on r but X — > Y doesn't hold. 

Consider the relation instance r with just two tuples: 
X + attributes Other attributes 

t A ^ t A n 

r: 1 1 1 ...1 1 1 1 ...1 
1 1 1 ...1 0 0 0 ...0 
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Completeness Proof (3/4) 

Claim 2: All FDs of F are satisfied by r 

Suppose not. Let W — > Z in F be an FD not satisfied by r 

Then W Q X + and Z £ X + 

Let A G Z - X + 
Now, X — > W follows from F using AA as W c X (claim 1) 

X — > Z follows from F using AA by transitive rule 

Z — > A follows from F using AA by reflexive rule as A E Z 

X — > A follows from F using AA by transitive rule 

By definition of closures, A must belong to X + 

- a contradiction. r: 111...1111...1 

Hence the claim. 1 1 1...1 0 0 0...0 



X R-X 
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Completeness Proof (4/4) 

Claim 3 : X — > Y is not satisfied by r 
Suppose not 

Because of the structure of r, Y Q X + 
=^> X — > Y can be derived from F using AA 

contradicting the assumption about X — > Y 
Hence the claim 

Thus, whenever X — > Y doesn't follow from F using AA, 

F doesn't logically imply X — > Y 
Armstrong's Axioms are complete. 
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Consequence of Completeness of AA 

X + = {A | X — > A follows from F using AA} 
= {A | F 1= X A} 

Similarly 

F + = {X -> Y | F t= X -> Y} 

= {X — > Y | X — > Y follows from F using AA} 
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Computing closures 

The size of F + can sometimes be exponential in the size of F. 

For instance, F = {A — > B 1? A — > B 2 , , A — > B n } 

F + = {A^X} where X c {B 1? B 2 ,...,B n }. 

Thus |F + | = 2 n 

Computing F + : computationally expensive 

Fortunately, checking if X — > Y E F + 

can be done by checking if Y Q X + F 

Computing attribute closure (X + F ) is easier 
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Computing X F 

We compute a sequence of sets X 0 , X 1? . . . as follows: 

X 0 := X; // X is the given set of attributes 
X i+1 := X { U {A | there is a FD Y Z in F 

and A G Z and Y c XJ 

Since X 0 c X x c X 2 c ... c Xj c X i+1 c ...e R 
and R is finite, 

There is an integer i st X { = X i+1 = X i+2 =. . . 
and X + F is equal to X { . 
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Normal Forms - 2NF 

Full functional dependency: 
An FD X — ► A for which there is no proper subset Y of X 
such that Y — ► A 

(A is said to be fully functionally dependent on X) 

2NF: A relation schema R is in 2NF if 
every non-prime attribute is fully functionally dependent on any 

key of R 

prime attribute: A attribute that is part of some key 
non-prime attribute: An attribute that is not part of any key 
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Example 

1) Book (authorName, title, authorAffiliation, ISBN, publisher, 

pubYear ) 

Keys: (authorName, title), ISBN 

Not in 2NF as authorName — » authorAffiliation 

(authorAffiliation is not fully functionally dependent on the 

first key) 

2) Student (rollNo, name, dept, sex, hostelName, roomNo, 

admitYear) 

Keys: rollNo, (hostelName, roomNo) 
Not in 2NF as hostelName — > sex 

student (rollNo, name, dept, hostelName, roomNo, admitYear) 
hostelDetail (hostelName, sex) 
- There are both in 2NF 
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Transitive Dependencies 

Transitive dependency: 
An FD X — > Y in a relation schema R for which there is a set of 
attributes Z Q R such that 

X —> Z and Z — > Y and Z is not a subset of any key of R 

Ex: student (rollNo, name, dept, hostelName, roomNo, headDept) 
Keys: rollNo, (hostelName, roomNo) 
rollNo — > dept; dept — > headDept hold 
So, rollNo — > headDept a transitive dependency 

Head of the dept of dept D is stored redundantly in every tuple 

where D appears. 
Relation is in 2NF but redundancy still exists. 
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Normal Forms - 3NF 

Relation schema R is in 3NF if it is in 2NF and no non-prime 
attribute of R is transitively dependent on any key of R 

student (rollNo, name, dept, hostelname, roomNo, headDept) 
is not in 3NF 

Decompose: student ( rollNo , name, dept, hostelName, roomNo ) 

deptlnfo ( dept , headDept) 
both in 3NF 

Redundancy in data storage - removed 
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Another definition of 3NF 

Relation schema R is in 3NF if for any nontrivial FD X — > A 
either (i) X is a superkey or (ii) A is prime. 

Suppose some R violates the above definition 
=> There is an FD X — ► A for which both (i) and (ii) are false 

=^> X is not a superkey and A is non-prime attribute 

Two cases arise: 

1) X is contained in a key - A is not fully functionally dependent 

on this key 

- violation of 2NF condition 

2) X is not contained in a key 

K — > X, X — ► A is a case of transitive dependency 

(K - any key of R) 
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Motivating example for BCNF 

gradelnfo (rollNo, studName, course, grade) 

Suppose the following FDs hold: 

1) rollNo, course — > grade Keys: 

2) studName, course — > grade (rollNo, course) 

3) rollNo — > studName (studName, course) 

4) studName — > rollNo 

For 1,2 lhs is a key. For 3,4 rhs is prime 
So gradelnfo is in 3NF 

But studName is stored redundantly along with every course 
being done by the student 
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Boyce - Codd Normal Form (BCNF) 

Relation schema R is in BCNF if for every nontrivial 
FD X — > A, X is a superkey of R. 

In gradelnfo, FDs 3, 4 are nontrivial but lhs is not a superkey 
So, gradelnfo is not in BCNF 

Decompose: 

gradelnfo ( rollNo, course , grade) 
studlnfo (rollNo, studName) 

Redundancy allowed by 3NF is disallowed by BCNF 

BCNF is stricter than 3NF 
3NF is stricter than 2NF 
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Decomposition of a relation schema 



If R doesn't satisfy a particular normal form, 
we decompose R into smaller schemas 



What's a decomposition? 

R = (A 1? A 2 ,. . A n ) 
D = (R l9 R 2 ,...,R k ) stRi c RandR = Rj U R 2 U ... U R k 

(Rj's need not be disjoint) 
Replacing R by R 1? R 2 ,. . R k - process of decomposing R 



Ex: gradelnfo (rollNo, studName, course, grade) 
Rj : gradelnfo ( rollNo, course , grade) 
R 2 : studlnfo ( rollNo , studName) 
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Desirable Properties of Decompositions 

Not all decomposition of a schema are useful 
We require two properties to be satisfied 

(i) Lossless join property 

- the information in an instance r of R must be preserved in the 
instances r 1? r 2 ,. . .,r k where v { = 7i R . (r) 

(ii) Dependency preserving property 

- if a set F of dependencies hold on R it should be possible to 
enforce F by enforcing appropriate dependencies on each r { 
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Lossless join property 



F - set of FDs that hold on R 

R - decomposed into R 1? R 2 ,. . .,R k 

Decomposition is lossless wrt F if 

for every relation instance r on R satisfying F. 

R = (A, B, C); R x = (A, B); R 2 = (B, C) 



r: ABC 
a 2 Cj 
a 2 b 2 c 2 

a 3 ^1 C 3 
Lossy join 



r i : 



A B 

a i b i 

a 2 b 2 

a 3 bj 



r 2 : 



B C 

b i c i 

b 2 c 2 

bj c 3 



Spurious tuples 



r 1 * r 2 : 



Lossless joins 
are also called 
non- additive joins 



Original info 
is distorted 



1 

ABC 

a x c 2 
a x bj c 3 
a 2 b 2 c 2 
a 3 bj cj 
a 3 bj c 3 
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Dependency Preserving Decompositions 

Decomposition D = (R 1? R 2 ,. . .,R k ) of schema R preserves a set 
of dependencies F if 

(7l Ri (F) U % 2 (F) U . . . U 7l Rk (F)) + = F + 

Here, 7i R . (F) = { (X -> Y) G F + | X c R., Y c R.} 
(called projection of F onto 

Informally, any FD that logically follows from F must also 
logically follow from the union of projections of F onto R|'s 
Then, D is called dependency preserving. 
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An example 

Schema R = (A, B, C) 
FDs F = {A -> B, B -> C, C -> A} 

Decomposition D = (Rj = {A, B}, R 2 = {B, C}) 
7t R (F)={A^B,B^A} 
7U R2 (F)={B^C,C^B} 

(7l Rj (F) U %2 (F)) + = {A - B, B - A, 

B -> C, C -> B, 
A -> C, C -> A} = F H 

Hence Dependency preserving 
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Testing for lossless decomposition property(l/6) 

R - given schema with attributes A 1? A 2 , . . A n 
F - given set of FDs 

D - {R 1? R 2 , • • Rm} given decomposition of R 
Is D a lossless decomposition? 

Create anmx/i matrix S with columns labeled as A 1? A 2 , . . ., A n 
and rows labeled as R 1? R 2 , • • , Rm 

Initialize the matrix as follows: 

set S(i,j) as symbol b- for all 

if A is in the scheme R|, then set S(iJ) as symbol a^ , for all ij 
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Testing for lossless decomposition property(2/6) 

After S is initialized, we carry out the following process on it: 
repeat 

for each functional dependency U — > V in F do 
for all rows in S which agree on [/-attributes do 
make the symbols in each V- attribute column 
the same in all the rows as follows: 
if any of the rows has an "<z" symbol for the column 

set the other rows to the same "0" symbol in the column 
else // if no "a" symbol exists in any of the rows 
choose one of the "b" symbols that appears 
in one of the rows for the V-attribute and 
set the other rows to that "b" symbol in the column 
until no changes to S 

At the end, if there exists a row with all "a" symbols then D is 
lossless otherwise D is a lossy decomposition 
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Testing for lossless decomposition property(3/6) 

R = (rollNo, name, advisor, advisorDept, course, grade) 

FD's = { rollNo — > name; rollNo — > advisor; advisor — > advisorDept 

rollNo, course — > grade} 
D : { Rj = (rollNo, name, advisor), R 2 = (advisor, advisorDept), 

R 3 = (rollNo, course, grade) } 
Matrix S : (Initial values) 





rollNo 


name 


advisor 


advisor 
Dept 


course 


grade 


R 1 


a 1 


a 2 




b 14 


b 15 


b 16 


R 2 


b 21 


b 22 




a 4 


b 25 


b 26 


R 3 


a 1 


b 32 


b 33 


b 34 


a 5 


a 6 
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Testing for lossless decomposition property(4/6) 

R = (rollNo, name, advisor, advisorDept, course, grade) 

FD's = { rollNo — > name; rollNo — > advisor; advisor — > advisorDept 

rollNo, course — > grade} 
D : { Rj = (rollNo, name, advisor), R 2 = (advisor, advisorDept), 
R 3 = (rollNo, course, grade) } 



Matrix S : (After enforcing rollNo — > name & rollNo — > advisor) 





rollNo 


name 


advisor 


advisor 
Dept 


course 


grade 


Ri 


a 1 


a 2 




b 14 


b 15 


t»l 6 


R 2 


b 21 


b 22 




a 4 


t>25 


b 26 


R 3 


a 1 


D 32 a 2 




b 34 


a 5 


a 6 
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Testing for lossless decomposition property(5/6) 

R = (rollNo, name, advisor, advisorDept, course, grade) 

FD's = { rollNo — > name; rollNo — > advisor; advisor — > advisorDept 

rollNo, course — > grade} 
D : { Rj = (rollNo, name, advisor), R 2 = (advisor, advisorDept), 
R 3 = (rollNo, course, grade) } 



Matrix S : (After enforcing advisor — > advisorDept ) 





rollNo 


name 


advisor 


advisor 
Dept 


course 


grade 


Ri 


a 1 


a 2 






b 15 


bi 6 


R 2 


b 21 


b 22 




a 4 


b 25 


b 26 


R 3 


a 1 


D 32 a 2 






a 5 


a 6 



No more changes. Third row with all a symbols. So a lossless join. 
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Testing for lossless decomposition property(6/6) 

R - given schema. F - given set of FDs 

The decomposition of R into R 1? R 2 is lossless wrt F if and only if 
either R x Pi R 2 — > (R x - R 2 ) belongs to F + or 

Rj n R 2 -> (R 2 - Rj) belongs to F + 

Eg. gradelnfo (rollNo, studName, course, grade) 

with FDs = {rollNo, course — > grade; studName, course — > grade; 

rollNo — > studName; studName —> rollNo} 
decomposed into 

grades (rollNo, course, grade) and studlnfo (rollNo, studName) 
is lossless because 

rollNo — > studName 
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A property of lossless joins 

D x : (R 1? R 2 ,. . R K ) lossless decomposition of R wrt F 
D 2 : (R n , R i2 ,. . R ip ) lossless decomposition of R { wrt Fj = 7i R (F) 
Then 

D — (Rj, R 2 , • • • ? Rj-i? Rn? Ri2? "-> ^ip? ^i+iv • -> ^k) i s a 

lossless decomposition of R wrt F 
This property is useful in the algorithm for BCNF decomposition 
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Algorithm for BCNF decomposition 

R - given schema. F - given set of FDs 
D = {R} // initial decomposition 

while there is a relation schema R { in D that is not in BCNF do 
{ let X -> A be the FD in R { violating BCNF; 
Replace R { by R n = R { - {A} and R i2 = X U {A} in D; 

} 

Decomposition of R { is lossless as 

R n n R i2 = X, R i2 - R n = A and X — ► A 

Result: a lossless decomposition of R into BCNF relations 



Prof P Sreenivasa Kumar 
Department of CS&E, IITM 



41 



Dependencies may not be preserved (1/2) 

S ^ T y D 

Consider the schema: townlnfo (stateName, townName, distName) 
with the FDs F: ST — > D (town names are unique within a state) 

D^S 

Keys: ST, DT. - all attributes are prime 

- relation in 3NF 
Relation is not in BCNF as D — > S and D is not a key 
Decomposition given by algorithm: R^ TD R 2 : DS 
Not dependency preserving as 7i R (F) = trivial dependencies 

Tt^F) = {D - S} 

Union of these doesn't imply ST — > D 

ST — > D can't be enforced unless we perform a join. 
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Dependencies may not be preserved (2/2) 

Consider the schema: R (A, B, C) 

with the FDs F: AB -> C and C B 
Keys: AB, AC - relation in 3NF (all attributes are prime) 

- Relation is not in BCNF as C — > B and C is not a key 

Decomposition given by algorithm: R^ CB R 2 : AC 
Not dependency preserving as 7i R (F) = trivial dependencies 

tu^F) = {C - B} 
Union of these doesn't imply AB — > C 

All possible decompositions: {AB, BC}, {BA, AC}, {AC, CB} 
Only the last one is lossless! 

Lossless and dependency-preserving decomposition doesn't exist. 
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Equivalent Dependency Sets 

F, G - two sets of FDs on schema R 

F is said to cover G if G £ F + (equivalently G + c F + ) 

F is equivalent to G if F + = G + (or, F covers G and G covers F) 

Note: To check if F covers G, 

it's enough to show that for each FD X — > Y in G, Y Q X + p 



Prof P Sreenivasa Kumar 
Department of CS&E, IITM 



44 



Canonical covers or Minimal covers 

It is of interest to reduce a set of FDs F into a "standard" form 
F' such that F' is equivalent to F. 

We define that a set of FDs F is in ' minimal form' if 

(i) the rhs of any FD of F is a single attribute 

(ii) there are no redundant FDs in F 

that is, there is no FD X — > A in F 

s.t (F - {X — > A}) is equivalent to F 
(iii) there are no redundant attributes on the lhs of any FD in F 

that is, there is no FD X — > A in F s.t there is Z C X for which 
F - {X -> A} U {Z A} is equivalent to F 

Minimal Covers 

useful in obtaining a lossless, dependency-preserving 
decomposition of a scheme R into 3NF relation schemas 
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Algorithm for computing a minimal cover 

R - given Schema or set of attributes; F - given set of fd's on R 
Step 1: G:=F 

Step 2: Replace every fd of the form X — > AJA2A3. . .A k in G 
byX-» A x ;X-> A 2 ;X-» A 3 ; ... ; X -> A k 

Step 3 : For each fd X — > A in G do 
for each B in X do 

if A G (X - B) + wrt G then 

replace X -> A by (X - B) -> A 

Step 4: For each fd X — > A in G do 

if(G-{X^A}) + = G + then 
replace G by G - { X — ► A} 
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3NF decomposition algorithm 

R - given Schema; F - given set of fd's on R in minimal form 

Use BCNF algorithm to get a lossless decomposition D = (R 1? R 2 ,. . .,R k ) 
Note: each R { is already in 3NF (it is in BCNF in fact!) 

Algorithm: Let G be the set of fd's not preserved in D 
For each fd Z — > A that is in G 

Add relation scheme S = (B 1? B 2 , . . ., B S ,A) to D. // Z = {B 1? B 2 , . . BJ 

As Z — > A is in F which is a minimal cover, 

there is no proper subset X of Z s.t X — > A. So Z is a key for S! 
Any other fd X — > C on S is such that C is in {B 1? B2> . . ., BJ . 

Such fd's do not violate 3NF because each Bj's is prime a attribute! 
Thus any scheme S added to D as above is in 3NF. 

D continues to be lossless even when we add new schemas to it! 
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Multi-valued Dependencies (MVDs) 

studCourseEmail( rollNo,courseNo,emailAddr ) 

a student enrolls for several courses and has several email addresses 
roll No — > course No ( read as rollNo multi-determines courseNo ) 

If (CS05B007, CS370, shyam@gmail.com) 

(CS05B007, CS376, shyam@yahoo.com) appear in the data then 

(CS05B007, CS376, shyam@gmail.com) 
(CS05B007, CS370, shyam@yahoo.com) 

should also appear for, otherwise, it implies that having gmail 
address has something to with doing course CS370 ! ! 

By symmetry, rollNo — > emailAddr 
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More about MVDs 

Consider studCourseGrade( rollNo,courseNo , grade) 

Note that rolINo — >— > courseNo does not hold here even though 
courseNo is a multi-valued attribute of student 

If (CS05B007, CS370, A) 

(CS05B007, CS376, B) appear in the data then 

(CS05B007, CS376, A) 

(CS05B007, CS370, B) will not appear ! ! 
Attribute 'grade' depends on (rollNo,courseNo) 

MVD's arise when two unrelated multi- valued attributes of an 
entity are sought to be represented together. 
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More about MVDs 

Consider 

studCourseAdvisor( rollNo,courseNo , advisor) 

Note that rollNo — > courseNo holds here 

If (CS05B007, CS370, Dr Ravi) 
(CS05B007, CS376, Dr Ravi) 
appear in the data then swapping courseNo values 
gives rise to existing tuples only. 

But, since rollNo — > advisor and (rollNo, courseNo) is the key, 
this gets caught in checking for 2NF itself. 
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Alternative definition of MVDs 

Consider R( X,Y,Z ) 

Suppose that X — >— > Y and by symmetry X — >— > Z 
Then, decomposition D = (XY, XZ) should be lossless 
That is, for any instance r on R, r = TT X y( r ) * TT X z( r ) 
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MVDs and 4NF 

An MVD X — > Y on scheme R is called trivial if either 
Y^X or R = XU Y. Otherwise, it is called nontrivial. 

4NF: A relation R is in 4NF if it is in BCNF and for every 
nontrivial MVD X — >— > A, X must be a superkey of R. 

studCourseEmail( rollNo,courseNo,emailAddr ) 
is not in 4NF as 

rolINo — > courseNo and 
rolINo — ► emailAddr 

are both nontrivial and rolINo is not a superkey for the 
relation 
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